Pantheon: Exascale File System Search for Scientific Computing
Abstract
Modern scientific computing generates petabytes of data spread across billions of files, all of which must be managed. These files are usually organized by name in the hierarchical directory tree common to most file systems. As data volumes have grown, this has proven to be a poor method of file organization. Recent tools let users navigate files by their metadata attributes, which provides more meaningful organization. To make this metadata searchable, it is often stored on separate metadata servers. This approach has drawbacks, however, because many large-scale storage systems use a multi-tiered architecture: as data is modified or moved between storage tiers, the overhead of keeping those tiers consistent with the metadata server becomes very large. As scientific systems continue to push towards exascale, this problem will become more pronounced. A simpler option is to bypass the metadata server and use the metadata storage inherent to the file system, but few tools currently exist to perform such operations at large scale. This paper introduces the prototype of Pantheon, a file system search tool designed to use the metadata storage within the file system itself, avoiding the overhead of metadata servers. Pantheon is also designed with the scientific community's push towards exascale computing in mind. It combines hierarchical partitioning, query optimization, and indexing to perform efficient metadata searches over large-scale file systems.
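As a rough illustration of searching the file system's own metadata with hierarchical partitioning, the following minimal Python sketch splits a directory tree at its top-level subdirectories and scans each partition in parallel using only `os.stat`. It is not Pantheon's implementation: the function names `search_subtree` and `partitioned_search` and the `workers` parameter are hypothetical, and the sketch includes no index or query optimizer.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def search_subtree(root, predicate):
    """Walk one partition (a directory subtree) and return paths whose stat matches."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # metadata comes straight from the file system
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if predicate(st):
                matches.append(path)
    return matches


def partitioned_search(root, predicate, workers=4):
    """Hypothetical helper: partition at top-level directories, search in parallel."""
    partitions = []
    matches = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                partitions.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # Files directly under the root are checked here.
                if predicate(entry.stat(follow_symlinks=False)):
                    matches.append(entry.path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda p: search_subtree(p, predicate), partitions):
            matches.extend(part)
    return sorted(matches)
```

A query is then an ordinary predicate over the `os.stat_result`, for example `partitioned_search("/data", lambda st: st.st_size > 2**30)` to list files larger than 1 GiB under an assumed `/data` mount. A real tool at exascale would need an index to avoid rescanning every file on each query.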
Related resources
Exploring reliability of exascale systems through simulations
Exascale computers are predicted to emerge by the end of this decade with millions of nodes and billions of concurrent cores/threads. One of the most critical challenges for exascale computing is how to effectively and efficiently maintain the system reliability. Checkpointing is the state-of-the-art technique for high-end computing system reliability that has proved to work well for current pet...
Dynamic Non-Hierarchical File Systems for Exascale Storage
Modern high-end computing (HEC) systems must manage petabytes of data stored in billions of files, yet current techniques for naming and managing files were developed 40 years ago for collections of thousands of files. HEC users are therefore forced to adapt their usage to fit an outdated file system model and interface, unsuitable for exascale systems. Attempts to enrich the interface, such as...
Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems
Many believe that the state-of-the-art yet decades old high-performance computing (HPC) storage would not meet the I/O requirement of the emerging exascale mainly due to the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the efficiency and availability would go towards zero as the system scales approach exascale. This work proposes a new arch...
SimHEC: Understanding Application Efficiency at Exascales through Simulations
HEC systems are expected to enter the exascale era within a decade, with one thousand times the performance of today's petascale systems. In the meantime, many challenges have been noticed and pointed out: if HEC systems grow in size without indispensable improvements to today's architecture, they could collapse at exascale, because the functionality wou...
FusionFS: a distributed file system for large scale data-intensive computing
Today's science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. Exascale computing, i.e. 10^18 FLOPS, is predicted to emerge by 2019 with current trends. Millions of nodes and billions of threads of execution, producing similarly large concurrent data accesses, are ...
Publication date: 2011